Abstract

Recently, statistical machine translation models have begun to take
advantage of higher level linguistic structures such as syntactic
dependencies. Underlying these models is an assumption about the
directness of translational correspondence between sentences in the two
languages; however, the extent to which this assumption is valid and
useful is not well understood. In this paper, we present an empirical
study that quantifies the degree to which syntactic dependencies are
preserved when parses are projected directly from English to Chinese.
Our results show that although the direct correspondence assumption is
often too restrictive, a small set of principled, elementary linguistic
transformations can boost the quality of the projected Chinese parses by
76% relative to the unimproved baseline.

1  Introduction 

Advances in statistical parsing and language modeling have shown the
importance of modeling grammatical dependencies (i.e., relationships
between syntactic heads and their modifiers) between words (Collins,
1997; Eisner, 1997; Chelba and Jelinek, 1998; Charniak, 2001). Informed
by the insights of this work, recent statistical machine translation
(MT) models have become linguistically richer in their representation of
monolingual relationships than their predecessors (Wu, 1995; Alshawi et
al., 2000; Yamada and Knight, 2001; cf. Brown et al., 1990; Brown et
al., 1993).

Using richer monolingual representations in statistical MT raises the
challenge of how to characterize the cross-language relationship between
two sets of monolingual syntactic relations. In this paper, we
investigate a characterization that often appears implicitly as a part
of newer statistical MT models, which we term the direct correspondence
assumption (DCA). Intuitively, the assumption is that for two sentences
in parallel translation, the syntactic relationships in one language
directly map to the syntactic relationships in the other. Since it has
not been described explicitly, the validity and utility of the DCA are
not well understood, although, without identifying the DCA as such,
other translation researchers have nonetheless found themselves working
around its limitations.^1

1 For example, Yamada and Knight (2001) account for non-DCA-respecting
variation by learning construction-specific local transformations on
constituency trees. There also exists a substantial literature in
transfer-based MT on learning mapping patterns for syntactic
relationships that do not correspond (e.g., Menezes and Richardson,
2001; Lavoie et al., 2001).

In Section 2 we show how the DCA appears implicitly in several models,
providing an explicit formal statement, and we discuss its potential
inadequacies. In Section 3, we provide a way to assess empirically the
extent to which the DCA holds true. Our results suggest that although
the DCA is too restrictive in many cases, a general set of principled,
elementary linguistic transformations can often resolve the problem. In
Section 4, we consider the implications of our experimental results and
discuss future work.

2  The  Direct  Correspondence  Assumption

To our knowledge, the direct correspondence assumption underlies all
statistical models that attempt to capture a relationship between
syntactic structures in two languages, be they constituent models or
dependency models. As an example of the former, consider Wu’s (1995)
stochastic inversion transduction grammar (SITG), in which paired
sentences are simultaneously generated using context-free rules; word
order differences are accounted for by allowing each rule to be read in
a left-to-right or right-to-left fashion, depending on the language. For
example, SITG can generate verb-initial (English) and verb-final
(Japanese) verb phrases using the same rule VP → V NP. For any
derivation using this rule, if V_e and NP_e are the English verb and
noun phrase, and they are respectively aligned with Japanese verb and
noun phrase V_j and NP_j, then VERB-OBJECT(V_e, NP_e) and
VERB-OBJECT(V_j, NP_j) must both be true.

As an example where the DCA relates dependency structures, consider the
hierarchical alignment algorithm proposed by Alshawi et al. (2000). In
this framework, word-level alignments and paired dependency structures
are constructed simultaneously. The English-Basque example (1)
illustrates: if the English word got is aligned to the Basque word erosi
(buy) and gift is aligned to opari, the creation of the head-modifier
relationship between got and gift is accompanied by the creation of a
corresponding head-modifier relationship between erosi and opari.

(1)  a.  I  got  a  gift  for  my  brother 

b.  Nik  (i)  nire  (my)  anaiari  (brother- 
dat)  opari  (gift)  bat  (a)  erosi  (buy) 
nion  (past) 

2.1  Formalizing  the  DCA 

Let us formalize this intuitive idea about corresponding syntactic
relationships in the following more general way:

Direct Correspondence Assumption (DCA): Given a pair of sentences E and
F that are (literal) translations of each other with syntactic
structures Tree_E and Tree_F, if nodes x_E and y_E of Tree_E are aligned
with nodes x_F and y_F of Tree_F, respectively, and if syntactic
relationship R(x_E, y_E) holds in Tree_E, then R(x_F, y_F) holds in
Tree_F.

Here, R(x, y) may specify a head-modifier relationship between words in
a dependency tree, or a sisterhood relationship between nonterminals in
a constituency tree. As stated, the DCA amounts to an assumption that
the cross-language alignment resembles a homomorphism relating the
syntactic graph of E to the syntactic graph of F.^2
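As a concrete illustration of the definition, the following minimal sketch checks the DCA on the dependencies listed in Table 1 for example (1). The function name `dca_violations` and the tuple encoding `(relation, head, modifier)` are our own illustrative choices rather than part of any model discussed here, and the sketch assumes every word is aligned to at most one word on the other side.

```python
from typing import Dict, Set, Tuple

Dep = Tuple[str, str, str]  # (relation label, head word, modifier word)


def dca_violations(deps_e: Set[Dep], deps_f: Set[Dep],
                   align: Dict[str, str]) -> Set[Dep]:
    """Return the E-side relations R(x_E, y_E) whose aligned image
    R(x_F, y_F) is absent from the F-side tree."""
    missing = set()
    for rel, x_e, y_e in deps_e:
        # The DCA only speaks about pairs in which both nodes are aligned.
        if x_e in align and y_e in align:
            if (rel, align[x_e], align[y_e]) not in deps_f:
                missing.add((rel, x_e, y_e))
    return missing


# Relations and alignments taken from Table 1 (English-Basque example (1)).
english = {("verb-subj", "got", "I"), ("verb-obj", "got", "gift"),
           ("noun-det", "gift", "a"), ("noun-mod", "brother", "my")}
basque = {("verb-subj", "erosi", "nik"), ("verb-obj", "erosi", "opari"),
          ("noun-det", "opari", "bat"), ("noun-mod", "anaiari", "nire")}
alignment = {"got": "erosi", "I": "nik", "gift": "opari",
             "a": "bat", "brother": "anaiari", "my": "nire"}

violations = dca_violations(english, basque, alignment)
```

For these four relations the set of violations is empty, which is what Table 1 records; the check says nothing about unaligned or multiply aligned words.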

Wu’s SITG makes this assumption, under the interpretation that R is the
head-modifier relation expressed in a rewrite rule. The IBM MT models
(Brown et al., 1993) do not respect the DCA, but neither do they attempt
to model any higher level syntactic relationship between constituents
within or across languages: the translation model (alignments) and the
language model are statistically independent. In Yamada and Knight’s
(2001) extension of the IBM models, on the other hand, grammatical
information from the source language is propagated into the noisy
channel, and the grammatical transformations in their channel model
appear to respect direct correspondence.^3 The simultaneous parsing and
alignment algorithm of Alshawi et al. (2000) is essentially an
implementation of the DCA in which relationship R has no linguistic
import (i.e., anything can be a head).


2 Some models embody a stronger version of the DCA that more closely
resembles an isomorphism between dependency graphs (Shieber, 1994),
though we will not pursue this idea further here.

3 Knight and Yamada actually pre-process the English input in cases that
most transparently violate direct correspondence; for example, they
permute English verbs to sentence-final position in the model
transforming English into Japanese. Most models we looked at have
addressed some effects of DCA failure, but they have not acknowledged it
explicitly as an underlying assumption, nor have they gone beyond
expedient measures to the type of principled analysis that we propose
below.


R           x_Eng     y_Eng    x_Bsq     y_Bsq
verb-subj   got       I        erosi     nik
verb-obj    got       gift     erosi     opari
noun-det    gift      a        opari     bat
noun-mod    brother   my       anaiari   nire

Table 1: Correspondences preserved in (1)


2.2  Problems  with  the  DCA 

The DCA seems to be a reasonable principle, especially when expressed in
terms of syntactic dependencies that abstract away word order. That is,
the thematic (who-did-what-to-whom) relationships are likely to hold
true across translations even for typologically different languages.
Consider example (1) again: despite the fact that the Basque sentence
has a different word order, with the verb appearing at the far right of
the sentence, the syntactic dependency relationships of English
(subject, object, noun modifier, etc.) are largely preserved across the
alignment, as illustrated in Table 1. Moreover, the DCA makes possible
more elegant formalisms (e.g., SITG) and more efficient algorithms. It
may allow us to use the syntactic analysis for one language to infer
annotations for the corresponding sentence in another language, helping
to reduce the labor and expense of creating treebanks in new languages
(Cabezas et al., 2001; Yarowsky and Ngai, 2001).

Unfortunately, the DCA is flawed, even for literal translations. For
example, in sentence pair (1), the indirect object of the verb is
expressed in English using a prepositional phrase (headed by the word
for) that attaches to the verb, but it is expressed with the dative case
marking on anaiari (brother-dat) in Basque. If we aligned both for and
brother to anaiari, then a many-to-one mapping would be formed, and the
DCA would be violated: R(for, brother) holds in the English tree but
R(anaiari, anaiari) does not hold in the Basque tree. Similarly, a
one-to-many mapping (e.g., aligning got with erosi (buy) and nion (past)
in this example) can also be problematic for the DCA.
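The many-to-one violation can be made concrete with a few lines of code. This is only a worked illustration of the case just described; the variable names are ours, and the Basque tree is restricted to the single relation from Table 1 that involves anaiari.

```python
# Head-modifier pairs (head, modifier) in the Basque tree, restricted to
# the relation from Table 1 that involves "anaiari".
basque_deps = {("anaiari", "nire")}

# Hypothetical many-to-one alignment: both "for" and "brother" -> "anaiari".
align = {"for": "anaiari", "brother": "anaiari"}

# R(for, brother) holds in the English tree; compute its aligned image.
head_e, mod_e = "for", "brother"
image = (align[head_e], align[mod_e])

# The image relates a word to itself, which no dependency tree contains.
violates_dca = image not in basque_deps
```

The image of the English relation is a pair relating anaiari to itself, so the DCA cannot be satisfied under this alignment.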

The inadequacy of the DCA should come as no surprise. The syntax
literature dating back to Chomsky (1981), together with a rich
computational literature on translation divergences (e.g., Abeille et
al., 1990; Dorr, 1994; Han et al., 2000), is concerned with
characterizing in a systematic way the apparent diversity of mechanisms
used by languages to express meanings syntactically. For example,
current theories claim that languages employ stable head-complement
orders across construction types. In English, the head of a phrase is
uniformly to the left of modifying prepositional phrases, sentential
complements, etc. In Chinese, verbal and prepositional phrases respect
the English ordering but heads in the nominal system uniformly appear to
the right. Systematic application of this sort of linguistic knowledge
turns out to be the key in getting beyond the DCA’s limitations.

3  Evaluating  the  DCA  using  Annotation  Projection

Thus far, we have argued that the DCA is a useful and widely assumed
principle; at the same time we have illustrated that it is incapable of
accounting for some well known and fundamental linguistic facts. Yet
this is not an unfamiliar situation. For years, stochastic modeling of
language has depended on the linguistically implausible assumptions
underlying n-gram models, hidden Markov models, context-free grammars,
and the like, with remarkable success. Having made the DCA explicit, we
would suggest that the right questions are: to what extent is it true,
and how useful is it when it holds?

In the remainder of the paper, we focus on answering the first question
empirically by considering the syntactic relationships and alignments
between translated sentence pairs in two distant languages (English and
Chinese). In our experimental framework, a system is given the “ideal”
syntactic analyses for the English sentences and English-Chinese
word alignments, and it uses a Direct Projection Algorithm (described
below) to project the English syntactic annotations directly across to
the Chinese sentences in accordance with the DCA. The resulting Chinese
dependency analyses are then compared with an independently derived gold
standard, enabling us to determine recall and precision figures for
syntactic dependencies (cf. Lin, 1998) and to perform a qualitative
error analysis. This error analysis led us to revise our projection
approach, and the resulting linguistically informed projection
significantly improved the ability to obtain accurate Chinese parses.
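The comparison against a gold standard can be sketched as follows. This is a minimal illustration of precision and recall over unlabeled dependencies, assuming each analysis is represented as a set of (head, modifier) pairs; the function name and the toy data are ours and do not come from the experiments.

```python
def precision_recall(projected, gold):
    """Precision and recall of projected (head, modifier) pairs
    against gold-standard pairs, ignoring relation labels."""
    correct = len(projected & gold)
    precision = correct / len(projected) if projected else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy example using word positions as node identifiers (made-up data).
projected = {(2, 1), (2, 3), (3, 4)}
gold = {(2, 1), (2, 3), (4, 3), (2, 4)}
p, r = precision_recall(projected, gold)
```

In the toy example two of the three projected pairs are in the gold standard, and two of the four gold pairs are recovered.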

This experimental framework for the first question is designed with an
eye toward the second, concerning the usefulness of making the direct
correspondence assumption. If the DCA holds true more often than not,
then one might speculate that the projected syntactic structures could
be useful as a treebank (albeit a noisy one) for training Chinese
parsers, and could help more generally in overcoming the syntactic
annotation bottleneck for languages other than English.

3.1  The  Direct  Projection  Algorithm 

The DCA translates fairly directly into an algorithm for projecting
English dependency analyses across to Chinese using word alignments as
the bridge. More formally, given sentence pair (E, F), the English
syntactic relations are projected for the following situations:

•  one-to-one if h_E ∈ E is aligned with a unique h_F ∈ F and m_E ∈ E
is aligned with a unique m_F ∈ F, then if R(h_E, m_E), conclude
R(h_F, m_F).

•  unaligned (English) if w_E ∈ E is not aligned with any word in F,
then create a new empty word n_F ∈ F such that for any x_E aligned with
a unique x_F, R(x_E, w_E) ⇒ R(x_F, n_F) and R(w_E, x_E) ⇒ R(n_F, x_F).

•  one-to-many if w_E ∈ E is aligned with w_1F, ..., w_nF, then create a
new empty word m_F ∈ F such that m_F is the parent of w_1F, ..., w_nF
and set w_E to align to m_F instead.

•  many-to-one if w_1E, ..., w_nE ∈ E are all uniquely aligned to
w_F ∈ F, then delete all alignments between w_iE (1 ≤ i ≤ n) and w_F
except for the head (denoted as w_hE); moreover, if w_iE, a modifier of
w_hE, had its own modifiers, R(w_iE, w_jE) ⇒ R(w_hF, w_jF).


The many-to-many case is decomposed into a two-step process: first
perform one-to-many, then perform many-to-one. Unaligned Chinese words
are left out of the projected syntactic tree. The asymmetry in the
treatment of one-to-many and many-to-one and of the unaligned words for
the two languages arises from the asymmetric nature of the projection.
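The four cases above can be sketched in a few dozen lines. The following is a simplified illustration, not the implementation used in the experiments: it assumes that words within a sentence are unique strings, that dependencies are unlabeled (head, modifier) pairs, and that no many-to-many links occur. The function name `direct_projection` and the `*emptyN*` naming of empty words are our own, and the alignment used in the example is the illustrative one discussed for sentence pair (1).

```python
from collections import defaultdict
from itertools import count


def direct_projection(words_e, deps_e, align):
    """Project English (head, modifier) pairs onto the F side.

    words_e: list of English words (assumed unique in this toy setting)
    deps_e:  set of (head, modifier) pairs over words_e
    align:   list of (english_word, f_word) alignment links
    """
    fresh = (f"*empty{i}*" for i in count(1))  # names for new empty F words
    e2f = defaultdict(list)
    for e, f in align:
        e2f[e].append(f)

    projected = set()
    image = {}  # English word -> the single F node that stands for it
    for e in words_e:
        targets = e2f[e]
        if len(targets) == 1:          # uniquely aligned
            image[e] = targets[0]
        elif not targets:              # unaligned (English): new empty word
            image[e] = next(fresh)
        else:                          # one-to-many: new empty parent node
            image[e] = next(fresh)
            projected.update((image[e], f) for f in targets)

    # many-to-one: within a group of English words sharing one F word, only
    # the group's head keeps its link when acting as a modifier.
    sharing = defaultdict(set)
    for e, f in image.items():
        sharing[f].add(e)
    parent_of = {m: h for h, m in deps_e}
    as_modifier = {e: f for e, f in image.items()
                   if parent_of.get(e) not in sharing[f]}

    # Project every relation whose modifier still has an image; modifiers of
    # a collapsed word attach to the F word of the group's head.
    for h, m in deps_e:
        if m in as_modifier:
            projected.add((image[h], as_modifier[m]))
    return projected


words = ["I", "got", "a", "gift", "for", "my", "brother"]
deps = {("got", "I"), ("got", "gift"), ("gift", "a"),
        ("got", "for"), ("for", "brother"), ("brother", "my")}
links = [("I", "nik"), ("got", "erosi"), ("got", "nion"), ("a", "bat"),
         ("gift", "opari"), ("for", "anaiari"), ("brother", "anaiari"),
         ("my", "nire")]
result = direct_projection(words, deps, links)
```

In this example got is aligned to two Basque words, so an empty node becomes the parent of erosi and nion and inherits the projected modifiers of got; for and brother share anaiari, so only for keeps its link and the modifier my of brother is attached to anaiari.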

3.2  Experimental  Setup 

The corpus for this experiment was constructed by obtaining manual
English translations for 124 Chinese newswire sentences (of 40 words or
fewer) contained in sections 001-015 of the Penn Chinese Treebank (Xia
et al., 2000). The Chinese data in our set ranged from simple sentences
to some complicated constructions such as complex relative clauses,
multiple run-on clauses, embeddings, nominal constructions, etc. Average
sentence length was 23.7 words.

Parses for the English sentences were constructed by a process of
automatic analysis followed by hand correction; output trees from a
broad-coverage lexicalized English parser (Collins, 1997) were
automatically converted into dependencies to be corrected. The
gold-standard dependency analyses for the Chinese sentences were
constructed manually by two fluent speakers of Chinese, working
independently and using the Chinese Treebank’s (manually constructed)
constituency parses for reference.^4 Inter-annotator agreement on
unlabeled syntactic dependencies is 92.4%. Manual English-Chinese
alignments were constructed by two annotators who are native speakers of
Chinese using a software environment similar to that described by
Melamed (1998).

4 One author of this paper served as one of the annotators.

The direct projection of English dependencies to Chinese yielded poor
results as measured by precision and recall over unlabeled syntactic
dependencies: precision was 30.1% and recall 39.1%. Inspection of the
results revealed that our manually aligned parallel corpus contained
many instances of multiply aligned or unaligned tokens, owing either to
freeness of translation (a violation of the assumption that translations
are literal) or to differences in how the two languages express the same
meaning. For example, to quantify a Chinese noun with a determiner, one
also needs to supply a measure word in addition to the quantity. Thus,
the noun phrase an apple is expressed as yee (an) ge (-meas) ping-guo
(apple). Chinese also includes separate words to indicate aspectual
categories such as continued action, in contrast to verbal suffixes in
English such as the -ing in running. Because Chinese classifiers,
aspectual particles, and other functional words do not appear in the
English sentence, there is no way for a projected English analysis to
correctly account for them. As a result, the Chinese dependency trees
usually fail to contain an appropriate grammatical relation for these
items. Because they are frequent, the failure to properly account for
them significantly hurts performance.

3.3  Revised  Projection 

Our error analysis led to the conclusion that the correspondence of
syntactic relationships would be improved by a better handling of the
one-to-many mappings and the unaligned cases. We investigated two ways
of addressing this issue.

First, we adopted a simple strategy informed by the tendency of
languages to have a consistent direction for “headedness”. Chinese and
English share the property that they are head-initial for most phrase
types. Thus, if an English word aligns to multiple Chinese words
c_1, ..., c_n, the leftmost word c_1 is treated as the head and
c_2, ..., c_n are analyzed as its dependents. If a Chinese empty node
was introduced to align with an untranslated English word, it is deleted
and its left-most child is promoted to replace it. Looking at language
in this non-construction-dependent way allows us to make simple changes
that have wide ranging effects. This is illustrative of how our approach
tries to rein in cases where the DCA breaks down by using linguistically
informed constraints that are as general as possible.
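The head-initial revision can be sketched as a post-processing step over a projected analysis. This is a simplified illustration under our own assumptions: dependencies are unlabeled (head, modifier) pairs, `groups` records for each empty node the Chinese words its English word was aligned to (an empty list for an untranslated English word), and `position` gives the linear position of each Chinese word. The function name and the example alignment of an to both yee and ge are hypothetical.

```python
def apply_head_initial(deps, groups, position):
    """Replace each empty node by a real word, head-initial style.

    deps:     set of (head, modifier) pairs that may mention empty nodes
    groups:   empty node -> list of F words aligned to its English word
              (empty list if the node stands for an untranslated word)
    position: F word -> linear position in the sentence
    """
    deps = set(deps)
    for empty, group in groups.items():
        # One-to-many: choose among the aligned words; otherwise choose
        # among the real-word children of the empty node.
        candidates = group or [m for h, m in deps
                               if h == empty and m in position]
        if not candidates:
            # Nothing to promote: drop the empty node and its relations.
            deps = {(h, m) for h, m in deps if empty not in (h, m)}
            continue
        new_head = min(candidates, key=position.__getitem__)  # leftmost
        deps = {(new_head if h == empty else h,
                 new_head if m == empty else m) for h, m in deps}
        deps.discard((new_head, new_head))  # the promoted word's old link
    return deps


# "an" hypothetically aligned to both yee (an) and ge (measure word).
projected = {("ping-guo", "*empty1*"),
             ("*empty1*", "yee"), ("*empty1*", "ge")}
revised = apply_head_initial(projected,
                             {"*empty1*": ["yee", "ge"]},
                             {"yee": 0, "ge": 1, "ping-guo": 2})
```

Substituting the promoted word for the empty node handles both cases at once: former siblings become its dependents, and whatever governed the empty node now governs the promoted word.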

Second, we used more detailed linguistic knowledge of Chinese to develop
a small set of rules, expressed in a tree-based pattern-action
formalism, that perform local modifications of a projected analysis on
the Chinese side. To avoid the slippery slope of unending
language-specific rule tweaking, we strictly constrained the possible
rules. Rules were permitted to refer only to closed class items, to
parts of speech projected from the English analysis, or to easily
enumerated lexical categories (e.g., {dollar, RMB, $, yen}).

For  example,  one  such  rule  deals  with  noun 
modification: 

•  If n_1, ..., n_k are a set of Chinese words aligned to an English
noun, replace the empty node introduced in the Direct Projection
Algorithm by promoting the last word n_k to its place with
n_1, ..., n_{k-1} as dependents.

Another  deals  with  aspectual  markers  for  verbs: 

•  If v_1, ..., v_k, a sequence of Chinese words aligned with English
verbs, is followed by a, an aspect marker, make a into a modifier of the
last verb v_k.
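The aspect-marker rule can be illustrated with a short sketch. It assumes unique tokens, unlabeled (head, modifier) pairs, and that the set of verb-aligned words and the closed class of aspect markers are supplied by the caller; the function name and the pinyin example sentence are ours and serve only as an illustration.

```python
def attach_aspect_markers(tokens, deps, verbs, aspect_markers):
    """Make each aspect marker a modifier of the verb just before it.

    tokens:         Chinese words in sentence order (assumed unique)
    deps:           set of (head, modifier) pairs
    verbs:          Chinese words aligned with English verbs
    aspect_markers: closed class of aspect markers
    """
    deps = set(deps)
    for prev, tok in zip(tokens, tokens[1:]):
        if tok in aspect_markers and prev in verbs:
            # Detach the marker from any earlier head, then attach it to
            # the last verb of the preceding verb sequence.
            deps = {(h, m) for h, m in deps if m != tok}
            deps.add((prev, tok))
    return deps


tokens = ["ta", "chi", "le", "ping-guo"]   # roughly "he ate (an) apple"
before = {("chi", "ta"), ("chi", "ping-guo")}
after = attach_aspect_markers(tokens, before, verbs={"chi"},
                              aspect_markers={"le"})
```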

The most involved transformation places a linguistic constraint on the
Chinese functional word de, which may be translated as that (the head of
a relative clause), as the preposition of, or as ’s (a marker for
possessives). This common Chinese functional word is almost always
either unaligned or multiply aligned to an English word.

•  If c_i is the Chinese word that appeared immediately to the left of
de and c_j is the Chinese word that appeared immediately to the right of
it, then find the lowest ancestors c_p and c_q for c_i and c_j,
respectively, such that R(c_p, c_q) exists; remove that relationship;
and replace it with R(de, c_p) and R(c_q, de).

The latter two changes may seem unrelated, but they both take advantage
of the fact that Chinese violates the head-initial rule in its nominal
system, where noun phrases are uniformly head-final. More generally, the
majority of rule patterns are variations on the same solution to the
same problem. Viewing the problem from a higher level of linguistic
abstraction made it possible to find all the relevant cases in a short
time (a few days) and express the solution compactly (fewer than 20
rules). The complete set of rules can be found in (Hwa et al., 2002).

3.4  A  New  Experiment 

Because our error analysis and subsequent algorithm refinements made use
of our original Chinese-English data set, we created a new test set
based on 88 new Chinese sentences from the Penn Chinese Treebank,
already manually translated into English as part of the NIST MT
evaluation preview.^5 These sentences averaged 19.0 words in length.

As described above, parses on the English side were created
semi-automatically, and word alignments were acquired manually. However,
in order to reduce our reliance on linguistically sophisticated human
annotators for Chinese syntax, we adopted an alternative strategy for
obtaining the gold standard: we automatically converted the Treebank’s
constituency parses of the Chinese sentences into syntactic dependency
representations, using an algorithm similar to the one described in
Section 2 of the paper by Xia and Palmer (2001).^6

5 See http://www.nist.gov/speech/tests/mt/. We used sentences from
sections 038, 039, 067, 122, 191, 207, 249 because, according to the
distributor, the translations of these sections (files with the .spc
suffix) have been more carefully verified.

6 The strategy was validated by performing the same process on the
original data set; the agreement rate with the human-generated
dependency trees was 97.5%. This led us to be confident that Treebank
constituency parses could be used automatically to create a gold
standard for syntactic dependencies.

Method         Precision   Recall   F-measure
Direct         34.5        42.5     38.1
Head-initial   59.4        59.4     59.4
Rules          68.0        66.6     67.3

Table 2: Performance on Chinese analyses (%)

The recall and precision figures for the new experiment are summarized
in Table 2. The first row of the table shows the results comparing the
output of the Direct Projection Algorithm with the gold standard. As we
have already seen, the quality of these trees is not very good. The
second row of the table shows that after applying the single
transformation based on the head-initial assumption, precision and
recall both improve significantly: using the F-measure to combine
precision and recall into a single figure of merit (Van Rijsbergen,
1979), the increase from 38.1% to 59.4% represents a 55.9% relative
improvement. The third row of the table shows that by applying the small
set of tree modification rules after direct projection (one of which is
default assignment of the head-initial analysis to multi-word phrases
when no other rule applies), we obtain an even larger improvement, the
67.3% F-measure representing a 76.6% relative gain over baseline
performance.
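The F-measure and relative-gain figures quoted from Table 2 can be re-derived with the short computation below. It assumes the balanced F-measure (the harmonic mean of precision and recall), which is consistent with the reported numbers; the function names are ours.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100 * (new - baseline) / baseline


# Precision and recall (in %) as reported in Table 2.
table2 = {"Direct": (34.5, 42.5),
          "Head-initial": (59.4, 59.4),
          "Rules": (68.0, 66.6)}
f_scores = {name: round(f_measure(p, r), 1) for name, (p, r) in table2.items()}
gain_head_initial = round(relative_gain(59.4, 38.1), 1)
gain_rules = round(relative_gain(67.3, 38.1), 1)
```

Rounded to one decimal place, these reproduce the F-measure column of Table 2 and the 55.9% and 76.6% relative gains.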

4  Conclusions  and  Future  Work 

To what extent is the DCA a valid assumption? Our experiments confirm
the linguistic intuition, indicating that one cannot safely assume a
direct mapping between the syntactic dependencies of one language and
the syntactic dependencies of another.

How useful is the DCA? The experimental results show that even the
simplistic DCA can be useful when operating in conjunction with small
quantities of systematic linguistic knowledge. Syntactic analyses
projected from English to Chinese can, in principle, yield Chinese
analyses that are nearly 70% accurate (in terms of unlabeled
dependencies) after application of a set of linguistically principled
rules. In the near future we will address the remaining errors, which
also seem to be amenable to a uniform linguistic treatment: in large
part they involve differences in category expression (nominal
expressions translated as verbs or vice versa) and we believe that we
can use context to effect the correct category transformations. We will
also explore correction of errors via statistical learning techniques.

The implication of this work for statistical translation modeling is
that a little bit of knowledge can be a good thing. The approach
described here strikes a balance somewhere between the endless
construction-by-construction tuning of rule-based approaches, on the one
hand, and, on the other, the development of insufficiently constrained
stochastic models.

We have systematically diagnosed a common assumption that has been dealt
with previously on a case-by-case basis, but not named. Most of the
models we know of, from early work at IBM to second-generation models
such as that of Knight and Yamada, rectify glaring problems caused by
the failure of the DCA using a range of pre- or post-processing
techniques.

We have identified the source for a host of these problems and have
suggested diagnostics for future cases where we might expect these
problems to arise. More important, we have shown that linguistically
informed strategies can be developed efficiently to improve output that
is otherwise compromised by situations where the DCA does not hold.

In addition to resolving the remaining problematic cases for our
projection framework, we are exploring ways to automatically create
large quantities of syntactically annotated data. This will break the
bottleneck in developing appropriately annotated training corpora.
Currently, we are following two research directions. Our first goal is
to minimize the degree of degradation in the quality of the projected
trees when the input analyses and word alignments are automatically
generated by a statistical parser and word alignment model. To improve
the quality of the input analyses, we are adapting active learning and
co-training techniques (Hwa, 2000; Sarkar, 2001) to exploit the most
reliable data. We are also actively developing an alternative alignment
model that makes more use of the syntactic structure (Lopez et al.,
2002). Our second goal is to detect and reduce the noise in the
projected trees so that they might replace the expensive human-annotated
corpora as training examples for statistical parsers. We are
investigating the use of filtering strategies to localize the
potentially problematic parts of the projected syntactic trees.